Overview
Dataset statistics
| Number of variables | 15 |
|---|---|
| Number of observations | 2430251 |
| Missing cells | 2615978 |
| Missing cells (%) | 7.2% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 278.1 MiB |
| Average record size in memory | 120.0 B |
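The derived figures in this table can be rechecked from the raw counts. The sketch below is not part of the report. It assumes that the missing-cell percentage is taken over rows times columns and that MiB means 2^20 bytes; the variable names are illustrative.

```python
# Figures copied from the Dataset statistics table.
n_rows = 2_430_251
n_cols = 15
missing_cells = 2_615_978
total_mib = 278.1

# Assumption: the percentage is relative to every cell in the table.
total_cells = n_rows * n_cols
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
bytes_per_row = total_mib * 1024**2 / n_rows

print(f"missing cells: {missing_pct:.1f}%")
print(f"average record size: {bytes_per_row:.1f} B")
```

Both results round to the values shown in the table, which suggests the table is internally consistent.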
Variable types
| Numeric | 12 |
|---|---|
| DateTime | 2 |
| Categorical | 1 |
Alerts
| Alert | Type |
|---|---|
| arrival_delay_m is highly overall correlated with prev_arrival_delay_m and 2 other fields | High correlation |
| max_station_number is highly overall correlated with stop_number | High correlation |
| prev_arrival_delay_m is highly overall correlated with arrival_delay_m and 2 other fields | High correlation |
| prev_departure_delay_m is highly overall correlated with arrival_delay_m and 2 other fields | High correlation |
| station_progress is highly overall correlated with stop_number | High correlation |
| stop_number is highly overall correlated with max_station_number and 1 other field | High correlation |
| weighted_avg_prev_delay is highly overall correlated with arrival_delay_m and 2 other fields | High correlation |
| transformed_info_message is highly imbalanced (52.4%) | Imbalance |
| IBNR has 121088 (5.0%) missing values | Missing |
| arrival_plan has 831630 (34.2%) missing values | Missing |
| departure_plan has 831630 (34.2%) missing values | Missing |
| arrival_delay_m has 831630 (34.2%) missing values | Missing |
| arrival_delay_m has 1043247 (42.9%) zeros | Zeros |
| prev_arrival_delay_m has 1936581 (79.7%) zeros | Zeros |
| prev_departure_delay_m has 1880371 (77.4%) zeros | Zeros |
| weighted_avg_prev_delay has 1467334 (60.4%) zeros | Zeros |
Reproduction
| Analysis started | 2024-11-19 22:54:30.821992 |
|---|---|
| Analysis finished | 2024-11-19 22:56:56.855073 |
| Duration | 2 minutes and 26.03 seconds |
| Software version | ydata-profiling v4.11.0 |
| Download configuration | config.json |
Variables
ID_Base
Real number (ℝ)
| Distinct | 40191 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -2.4653179 × 10^16 |
| Minimum | -9.223177 × 10^18 |
| Maximum | 9.2217322 × 10^18 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 1222099 |
| Negative (%) | 50.3% |
| Memory size | 18.5 MiB |
Quantile statistics
| Minimum | -9.223177 × 10^18 |
|---|---|
| 5-th percentile | -8.3267611 × 10^18 |
| Q1 | -4.5909339 × 10^18 |
| median | -5.8750657 × 10^16 |
| Q3 | 4.5603096 × 10^18 |
| 95-th percentile | 8.3397438 × 10^18 |
| Maximum | 9.2217322 × 10^18 |
| Range | -1.8348813 × 10^15 |
| Interquartile range (IQR) | 9.1512435 × 10^18 |
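A negative range cannot be correct, because the maximum exceeds the minimum. One plausible explanation, which the report itself does not state, is that ID_Base is held as a signed 64-bit integer and the subtraction wrapped around. The sketch below emulates that wraparound in plain Python; `wrap_int64` is a hypothetical helper, and the inputs are the rounded extremes printed in the extreme-value tables, so the result only matches the reported figure to about six significant digits.

```python
INT64_MODULUS = 2**64


def wrap_int64(value: int) -> int:
    """Reduce an arbitrary Python int to the signed 64-bit range."""
    return (value + 2**63) % INT64_MODULUS - 2**63


# Extremes as printed in the report (rounded to 10 significant digits).
maximum = 9_221_732_242 * 10**9
minimum = -9_223_176_951 * 10**9

true_range = maximum - minimum          # exact, Python ints do not overflow
wrapped_range = wrap_int64(true_range)  # what a 64-bit subtraction would store

print(f"true range:    {true_range:.7e}")
print(f"wrapped range: {wrapped_range:.7e}")
```

Under this assumption the true range is about 1.84 × 10^19, and the wrapped value lands near the reported -1.83 × 10^15. Because the values look like hashes, ordinal statistics on this column carry little meaning either way.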
Descriptive statistics
| Standard deviation | 5.3244526 × 10^18 |
|---|---|
| Coefficient of variation (CV) | -215.97428 |
| Kurtosis | -1.1923179 |
| Mean | -2.4653179 × 10^16 |
| Median Absolute Deviation (MAD) | 4.5753167 × 10^18 |
| Skewness | 0.010512774 |
| Sum | 1.6107104 × 10^18 |
| Variance | 2.8349795 × 10^37 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2.256484864 × 10^18 | 413 | < 0.1% |
| 8.467202706 × 10^18 | 390 | < 0.1% |
| 8.668076605 × 10^18 | 382 | < 0.1% |
| -7.996941865 × 10^18 | 381 | < 0.1% |
| -2.094717035 × 10^18 | 345 | < 0.1% |
| -1.78380972 × 10^17 | 338 | < 0.1% |
| 2.688663988 × 10^18 | 337 | < 0.1% |
| -8.560851479 × 10^18 | 321 | < 0.1% |
| -6.831600949 × 10^18 | 309 | < 0.1% |
| -6.568589303 × 10^18 | 309 | < 0.1% |
| Other values (40181) | 2426726 | 99.9% |
| Value | Count | Frequency (%) |
| -9.223176951 × 10^18 | 5 | < 0.1% |
| -9.222587614 × 10^18 | 17 | < 0.1% |
| -9.222235769 × 10^18 | 42 | < 0.1% |
| -9.221813993 × 10^18 | 202 | < 0.1% |
| -9.221229322 × 10^18 | 5 | < 0.1% |
| -9.221103336 × 10^18 | 91 | < 0.1% |
| -9.220755073 × 10^18 | 33 | < 0.1% |
| -9.220659516 × 10^18 | 110 | < 0.1% |
| -9.220172063 × 10^18 | 20 | < 0.1% |
| -9.219634608 × 10^18 | 18 | < 0.1% |
| Value | Count | Frequency (%) |
| 9.221732242 × 10^18 | 81 | < 0.1% |
| 9.221055243 × 10^18 | 53 | < 0.1% |
| 9.220892138 × 10^18 | 15 | < 0.1% |
| 9.22087069 × 10^18 | 54 | < 0.1% |
| 9.220854484 × 10^18 | 7 | < 0.1% |
| 9.219893508 × 10^18 | 144 | < 0.1% |
| 9.219684671 × 10^18 | 5 | < 0.1% |
| 9.219589171 × 10^18 | 14 | < 0.1% |
| 9.218406789 × 10^18 | 42 | < 0.1% |
| 9.218312429 × 10^18 | 56 | < 0.1% |
ID_Timestamp
Real number (ℝ)
| Distinct | 10109 |
|---|---|
| Distinct (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.4071086 × 10^9 |
| Minimum | 2.4070319 × 10^9 |
| Maximum | 2.4071424 × 10^9 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 18.5 MiB |
Quantile statistics
| Minimum | 2.4070319 × 10^9 |
|---|---|
| 5-th percentile | 2.4070807 × 10^9 |
| Q1 | 2.4070914 × 10^9 |
| median | 2.4071109 × 10^9 |
| Q3 | 2.4071222 × 10^9 |
| 95-th percentile | 2.4071415 × 10^9 |
| Maximum | 2.4071424 × 10^9 |
| Range | 110501 |
| Interquartile range (IQR) | 30843 |
Descriptive statistics
| Standard deviation | 21563.79 |
|---|---|
| Coefficient of variation (CV) | 8.9583785 × 10^-6 |
| Kurtosis | -0.02141794 |
| Mean | 2.4071086 × 10^9 |
| Median Absolute Deviation (MAD) | 19381 |
| Skewness | -0.3486081 |
| Sum | 5.849878 × 10^15 |
| Variance | 4.6499703 × 10^8 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2407080833 | 835 | < 0.1% |
| 2407090833 | 835 | < 0.1% |
| 2407091633 | 823 | < 0.1% |
| 2407100833 | 814 | < 0.1% |
| 2407081633 | 809 | < 0.1% |
| 2407111633 | 799 | < 0.1% |
| 2407120733 | 796 | < 0.1% |
| 2407101633 | 795 | < 0.1% |
| 2407110833 | 792 | < 0.1% |
| 2407120833 | 789 | < 0.1% |
| Other values (10099) | 2422164 | 99.7% |
| Value | Count | Frequency (%) |
| 2407031857 | 3 | < 0.1% |
| 2407040236 | 24 | < 0.1% |
| 2407040245 | 11 | < 0.1% |
| 2407040253 | 2 | < 0.1% |
| 2407040302 | 19 | < 0.1% |
| 2407040303 | 6 | < 0.1% |
| 2407040312 | 20 | < 0.1% |
| 2407040313 | 30 | < 0.1% |
| 2407040314 | 1 | < 0.1% |
| 2407040317 | 65 | < 0.1% |
| Value | Count | Frequency (%) |
| 2407142358 | 1 | < 0.1% |
| 2407142354 | 4 | < 0.1% |
| 2407142353 | 6 | < 0.1% |
| 2407142352 | 3 | < 0.1% |
| 2407142351 | 25 | < 0.1% |
| 2407142350 | 9 | < 0.1% |
| 2407142349 | 3 | < 0.1% |
| 2407142348 | 29 | < 0.1% |
| 2407142347 | 6 | < 0.1% |
| 2407142346 | 23 | < 0.1% |
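The ten-digit values look like timestamps packed as YYMMDDHHMM rather than measured quantities, which would make the numeric statistics above (mean, range, variance) hard to interpret. This reading is an assumption, not something the report states; it is consistent with the sample rows, where 2407082041 appears next to an arrival_plan of 2024-07-08 20:44:00. A minimal sketch of decoding under that assumption, with `decode_id_timestamp` as a hypothetical helper:

```python
from datetime import datetime


def decode_id_timestamp(value: int) -> datetime:
    """Interpret a 10-digit ID_Timestamp as YYMMDDHHMM (assumed layout)."""
    return datetime.strptime(f"{value:010d}", "%y%m%d%H%M")


# Smallest and largest values from the extreme-value tables.
earliest = decode_id_timestamp(2407031857)
latest = decode_id_timestamp(2407142358)

print("earliest:", earliest.isoformat(sep=" "))
print("latest:  ", latest.isoformat(sep=" "))
print("span:    ", latest - earliest)
```

If the assumption holds, profiling this column as a DateTime would give more useful statistics than treating it as a real number.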
stop_number
Real number (ℝ)
High correlation 
| Distinct | 59 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.585509 |
| Minimum | 1 |
| Maximum | 59 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 18.5 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 5 |
| median | 9 |
| Q3 | 15 |
| 95-th percentile | 25 |
| Maximum | 59 |
| Range | 58 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 7.4487918 |
|---|---|
| Coefficient of variation (CV) | 0.70367819 |
| Kurtosis | 0.73941142 |
| Mean | 10.585509 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | 1.0059828 |
| Sum | 25725444 |
| Variance | 55.4845 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2 | 193286 | 8.0% |
| 3 | 183914 | 7.6% |
| 4 | 175289 | 7.2% |
| 5 | 164351 | 6.8% |
| 6 | 152967 | 6.3% |
| 7 | 141799 | 5.8% |
| 8 | 131737 | 5.4% |
| 9 | 123448 | 5.1% |
| 10 | 114022 | 4.7% |
| 11 | 104747 | 4.3% |
| Other values (49) | 944691 | 38.9% |
| Value | Count | Frequency (%) |
| 1 | 40018 | 1.6% |
| 2 | 193286 | 8.0% |
| 3 | 183914 | 7.6% |
| 4 | 175289 | 7.2% |
| 5 | 164351 | 6.8% |
| 6 | 152967 | 6.3% |
| 7 | 141799 | 5.8% |
| 8 | 131737 | 5.4% |
| 9 | 123448 | 5.1% |
| 10 | 114022 | 4.7% |
| Value | Count | Frequency (%) |
| 59 | 33 | < 0.1% |
| 58 | 33 | < 0.1% |
| 57 | 33 | < 0.1% |
| 56 | 34 | < 0.1% |
| 55 | 31 | < 0.1% |
| 54 | 42 | < 0.1% |
| 53 | 60 | < 0.1% |
| 52 | 59 | < 0.1% |
| 51 | 73 | < 0.1% |
| 50 | 120 | < 0.1% |
IBNR
Real number (ℝ)
Missing 
| Distinct | 5198 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 121088 |
| Missing (%) | 5.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8020144.7 |
| Minimum | 8000001 |
| Maximum | 8099506 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 18.5 MiB |
Quantile statistics
| Minimum | 8000001 |
|---|---|
| 5-th percentile | 8000208 |
| Q1 | 8002101 |
| median | 8004483 |
| Q3 | 8011755 |
| 95-th percentile | 8089091 |
| Maximum | 8099506 |
| Range | 99505 |
| Interquartile range (IQR) | 9654 |
Descriptive statistics
| Standard deviation | 32816.948 |
|---|---|
| Coefficient of variation (CV) | 0.0040918149 |
| Kurtosis | 0.59370912 |
| Mean | 8020144.7 |
| Median Absolute Deviation (MAD) | 2937 |
| Skewness | 1.5822273 |
| Sum | 1.8519821 × 10^13 |
| Variance | 1.0769521 × 10^9 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 8089028 | 9069 | 0.4% |
| 8004128 | 8618 | 0.4% |
| 8098549 | 7543 | 0.3% |
| 8004129 | 7432 | 0.3% |
| 8004135 | 7432 | 0.3% |
| 8004131 | 7425 | 0.3% |
| 8004132 | 7424 | 0.3% |
| 8089047 | 7250 | 0.3% |
| 8004136 | 7121 | 0.3% |
| 8004179 | 6589 | 0.3% |
| Other values (5188) | 2233260 | 91.9% |
| (Missing) | 121088 | 5.0% |
| Value | Count | Frequency (%) |
| 8000001 | 490 | < 0.1% |
| 8000002 | 1 | < 0.1% |
| 8000004 | 347 | < 0.1% |
| 8000007 | 347 | < 0.1% |
| 8000009 | 455 | < 0.1% |
| 8000010 | 365 | < 0.1% |
| 8000011 | 570 | < 0.1% |
| 8000012 | 429 | < 0.1% |
| 8000013 | 977 | < 0.1% |
| 8000014 | 427 | < 0.1% |
| Value | Count | Frequency (%) |
| 8099506 | 197 | < 0.1% |
| 8098553 | 4453 | 0.2% |
| 8098549 | 7543 | 0.3% |
| 8098360 | 1 | < 0.1% |
| 8098348 | 192 | < 0.1% |
| 8098263 | 6323 | 0.3% |
| 8098205 | 2651 | 0.1% |
| 8098193 | 462 | < 0.1% |
| 8098147 | 2544 | 0.1% |
| 8098105 | 4921 | 0.2% |
long
Real number (ℝ)
| Distinct | 3125 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.171016 |
| Minimum | 0.834032 |
| Maximum | 14.982644 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 18.5 MiB |
Quantile statistics
| Minimum | 0.834032 |
|---|---|
| 5-th percentile | 6.852636 |
| Q1 | 8.364945 |
| median | 9.918336 |
| Q3 | 12.201874 |
| 95-th percentile | 13.553202 |
| Maximum | 14.982644 |
| Range | 14.148612 |
| Interquartile range (IQR) | 3.836929 |
Descriptive statistics
| Standard deviation | 2.3114801 |
|---|---|
| Coefficient of variation (CV) | 0.22726147 |
| Kurtosis | -1.1159849 |
| Mean | 10.171016 |
| Median Absolute Deviation (MAD) | 1.777579 |
| Skewness | 0.11707699 |
| Sum | 24718122 |
| Variance | 5.3429402 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 11.536537 | 8022 | 0.3% |
| 11.575386 | 7373 | 0.3% |
| 11.583234 | 7368 | 0.3% |
| 11.548572 | 7363 | 0.3% |
| 11.565619 | 7329 | 0.3% |
| 13.283966 | 7075 | 0.3% |
| 11.593049 | 6923 | 0.3% |
| 11.519245 | 6512 | 0.3% |
| 11.503669 | 6498 | 0.3% |
| 11.604971 | 6132 | 0.3% |
| Other values (3115) | 2359656 | 97.1% |
| Value | Count | Frequency (%) |
| 0.834032 | 725 | < 0.1% |
| 0.896632 | 710 | < 0.1% |
| 6.070715 | 1427 | 0.1% |
| 6.07384 | 894 | < 0.1% |
| 6.074485 | 1049 | < 0.1% |
| 6.08378 | 724 | < 0.1% |
| 6.091499 | 441 | < 0.1% |
| 6.094486 | 1279 | 0.1% |
| 6.097265 | 807 | < 0.1% |
| 6.098877 | 719 | < 0.1% |
| Value | Count | Frequency (%) |
| 14.982644 | 758 | < 0.1% |
| 14.97908 | 189 | < 0.1% |
| 14.936008 | 1 | < 0.1% |
| 14.930408 | 727 | < 0.1% |
| 14.902088 | 248 | < 0.1% |
| 14.889318 | 738 | < 0.1% |
| 14.825531 | 738 | < 0.1% |
| 14.825234 | 738 | < 0.1% |
| 14.805774 | 41 | < 0.1% |
| 14.706775 | 259 | < 0.1% |
lat
Real number (ℝ)
| Distinct | 3130 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 50.965929 |
| Minimum | 47.417954 |
| Maximum | 55.021381 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 18.5 MiB |
Quantile statistics
| Minimum | 47.417954 |
|---|---|
| 5-th percentile | 48.047217 |
| Q1 | 49.382114 |
| median | 51.047991 |
| Q3 | 52.500737 |
| 95-th percentile | 53.965491 |
| Maximum | 55.021381 |
| Range | 7.6034266 |
| Interquartile range (IQR) | 3.1186226 |
Descriptive statistics
| Standard deviation | 1.91654 |
|---|---|
| Coefficient of variation (CV) | 0.037604338 |
| Kurtosis | -0.96141838 |
| Mean | 50.965929 |
| Median Absolute Deviation (MAD) | 1.4719641 |
| Skewness | -0.00072081555 |
| Sum | 1.2386 × 10^8 |
| Variance | 3.6731256 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 48.142623 | 8022 | 0.3% |
| 48.137048 | 7373 | 0.3% |
| 48.134202 | 7368 | 0.3% |
| 48.141969 | 7363 | 0.3% |
| 48.139452 | 7329 | 0.3% |
| 52.500737 | 7075 | 0.3% |
| 48.129168 | 6923 | 0.3% |
| 48.14354 | 6512 | 0.3% |
| 48.144371 | 6498 | 0.3% |
| 48.12744 | 6132 | 0.3% |
| Other values (3120) | 2359656 | 97.1% |
| Value | Count | Frequency (%) |
| 47.4179544 | 718 | < 0.1% |
| 47.456591 | 213 | < 0.1% |
| 47.5058367 | 1496 | 0.1% |
| 47.513241 | 428 | < 0.1% |
| 47.5251713 | 730 | < 0.1% |
| 47.543785 | 723 | < 0.1% |
| 47.544341 | 49 | < 0.1% |
| 47.547219 | 723 | < 0.1% |
| 47.54792 | 748 | < 0.1% |
| 47.549143 | 729 | < 0.1% |
| Value | Count | Frequency (%) |
| 55.021381 | 749 | < 0.1% |
| 55.019862 | 751 | < 0.1% |
| 55.017947 | 733 | < 0.1% |
| 55.01765 | 736 | < 0.1% |
| 55.0149 | 725 | < 0.1% |
| 55.012455 | 744 | < 0.1% |
| 55.010432 | 765 | < 0.1% |
| 55.008077 | 731 | < 0.1% |
| 55.001937 | 697 | < 0.1% |
| 54.988543 | 753 | < 0.1% |
arrival_plan
Date
Missing 
| Distinct | 10081 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 831630 |
| Missing (%) | 34.2% |
| Memory size | 18.5 MiB |
| Minimum | 2024-07-07 23:37:00 |
| Maximum | 2024-07-14 23:58:00 |
departure_plan
Date
Missing 
| Distinct | 10077 |
|---|---|
| Distinct (%) | 0.6% |
| Missing | 831630 |
| Missing (%) | 34.2% |
| Memory size | 18.5 MiB |
| Minimum | 2024-07-08 00:00:00 |
| Maximum | 2024-07-14 23:58:00 |
arrival_delay_m
Real number (ℝ)
High correlation  Missing  Zeros 
| Distinct | 110 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 831630 |
| Missing (%) | 34.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.2553144 |
| Minimum | 0 |
| Maximum | 159 |
| Zeros | 1043247 |
| Zeros (%) | 42.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 18.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 6 |
| Maximum | 159 |
| Range | 159 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 3.4423568 |
|---|---|
| Coefficient of variation (CV) | 2.7422268 |
| Kurtosis | 99.038054 |
| Mean | 1.2553144 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 7.386492 |
| Sum | 2006772 |
| Variance | 11.849821 |
| Monotonicity | Not monotonic |
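The Zeros percentage for arrival_delay_m is expressed against all rows, including the 34.2% that are missing. The sketch below, which is not part of the report, recomputes the share of zeros among observed values only and rechecks the coefficient of variation as standard deviation over mean. All inputs are copied from the tables above; the variable names are illustrative.

```python
# Figures copied from the arrival_delay_m tables.
n_rows = 2_430_251
missing = 831_630
zeros = 1_043_247
mean = 1.2553144
std = 3.4423568
reported_sum = 2_006_772

observed = n_rows - missing

# Share of zeros relative to all rows (as reported) and to observed rows only.
zeros_of_all = 100 * zeros / n_rows
zeros_of_observed = 100 * zeros / observed

# Coefficient of variation: standard deviation divided by the mean.
cv = std / mean

print(f"zeros, all rows:      {zeros_of_all:.1f}%")
print(f"zeros, observed rows: {zeros_of_observed:.1f}%")
print(f"coefficient of variation: {cv:.4f}")
print(f"mean x observed rows: {mean * observed:.0f}")
```

The product of the mean and the observed row count reproduces the reported Sum, which indicates that the mean is computed over non-missing values. Roughly two thirds of the observed delays are zero, so the distribution is more zero-heavy than the 42.9% headline suggests.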
| Value | Count | Frequency (%) |
| 0 | 1043247 | 42.9% |
| 1 | 218086 | 9.0% |
| 2 | 112971 | 4.6% |
| 3 | 69359 | 2.9% |
| 4 | 38702 | 1.6% |
| 5 | 26169 | 1.1% |
| 6 | 18020 | 0.7% |
| 7 | 12762 | 0.5% |
| 8 | 10065 | 0.4% |
| 9 | 8115 | 0.3% |
| Other values (100) | 41125 | 1.7% |
| (Missing) | 831630 | 34.2% |
| Value | Count | Frequency (%) |
| 0 | 1043247 | 42.9% |
| 1 | 218086 | 9.0% |
| 2 | 112971 | 4.6% |
| 3 | 69359 | 2.9% |
| 4 | 38702 | 1.6% |
| 5 | 26169 | 1.1% |
| 6 | 18020 | 0.7% |
| 7 | 12762 | 0.5% |
| 8 | 10065 | 0.4% |
| 9 | 8115 | 0.3% |
| Value | Count | Frequency (%) |
| 159 | 1 | < 0.1% |
| 157 | 1 | < 0.1% |
| 140 | 1 | < 0.1% |
| 136 | 1 | < 0.1% |
| 134 | 1 | < 0.1% |
| 133 | 2 | < 0.1% |
| 120 | 1 | < 0.1% |
| 117 | 1 | < 0.1% |
| 116 | 1 | < 0.1% |
| 110 | 7 | < 0.1% |
transformed_info_message
Categorical
Imbalance 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 18.5 MiB |
| No message | 1895754 |
|---|---|
| Information | 266395 |
| Bauarbeiten | 140053 |
| Störung | 121627 |
| Großstörung | 6422 |
Length
| Max length | 11 |
|---|---|
| Median length | 10 |
| Mean length | 10.019747 |
| Min length | 7 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | No message |
|---|---|
| 2nd row | No message |
| 3rd row | No message |
| 4th row | No message |
| 5th row | No message |
Common Values
| Value | Count | Frequency (%) |
| No message | 1895754 | 78.0% |
| Information | 266395 | 11.0% |
| Bauarbeiten | 140053 | 5.8% |
| Störung | 121627 | 5.0% |
| Großstörung | 6422 | 0.3% |
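The 52.4% imbalance figure in the alert can be reproduced from these counts. The sketch below assumes the score is one minus the Shannon entropy of the class distribution divided by the maximum entropy for that number of classes. That formula is an assumption about how the score is derived rather than something the report states, and `imbalance_score` is a hypothetical helper.

```python
import math

# Counts copied from the Common Values table.
counts = {
    "No message": 1_895_754,
    "Information": 266_395,
    "Bauarbeiten": 140_053,
    "Störung": 121_627,
    "Großstörung": 6_422,
}


def imbalance_score(class_counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the classes.

    0 means perfectly balanced classes; 1 means a single class holds every row.
    """
    total = sum(class_counts)
    entropy = -sum(
        (c / total) * math.log(c / total) for c in class_counts if c > 0
    )
    return 1 - entropy / math.log(len(class_counts))


score = imbalance_score(list(counts.values()))
print(f"imbalance: {100 * score:.1f}%")
```

The same helper returns 0 for equal class counts, which is a quick sanity check on the normalisation.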
Words
| Value | Count | Frequency (%) |
| no | 1895754 | 43.8% |
| message | 1895754 | 43.8% |
| information | 266395 | 6.2% |
| bauarbeiten | 140053 | 3.2% |
| störung | 121627 | 2.8% |
| großstörung | 6422 | 0.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 4071614 | 16.7% |
| s | 3797930 | 15.6% |
| a | 2442255 | 10.0% |
| o | 2434966 | 10.0% |
| m | 2162149 | 8.9% |
| g | 2023803 | 8.3% |
| N | 1895754 | 7.8% |
| (space) | 1895754 | 7.8% |
| n | 800892 | 3.3% |
| r | 540919 | 2.2% |
| Other values (11) | 2284463 | 9.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 24350499 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 24350499 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 24350499 | 100.0% |
prev_arrival_delay_m
Real number (ℝ)
High correlation  Zeros 
| Distinct | 103 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.72297676 |
| Minimum | 0 |
| Maximum | 159 |
| Zeros | 1936581 |
| Zeros (%) | 79.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 18.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 4 |
| Maximum | 159 |
| Range | 159 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 2.6519985 |
|---|---|
| Coefficient of variation (CV) | 3.6681656 |
| Kurtosis | 159.62441 |
| Mean | 0.72297676 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 9.4458696 |
| Sum | 1757015 |
| Variance | 7.0330962 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 1936581 | 79.7% |
| 1 | 194073 | 8.0% |
| 2 | 101566 | 4.2% |
| 3 | 62576 | 2.6% |
| 4 | 34381 | 1.4% |
| 5 | 23066 | 0.9% |
| 6 | 15828 | 0.7% |
| 7 | 11062 | 0.5% |
| 8 | 8783 | 0.4% |
| 9 | 7055 | 0.3% |
| Other values (93) | 35280 | 1.5% |
| Value | Count | Frequency (%) |
| 0 | 1936581 | 79.7% |
| 1 | 194073 | 8.0% |
| 2 | 101566 | 4.2% |
| 3 | 62576 | 2.6% |
| 4 | 34381 | 1.4% |
| 5 | 23066 | 0.9% |
| 6 | 15828 | 0.7% |
| 7 | 11062 | 0.5% |
| 8 | 8783 | 0.4% |
| 9 | 7055 | 0.3% |
| Value | Count | Frequency (%) |
| 159 | 1 | < 0.1% |
| 140 | 1 | < 0.1% |
| 136 | 1 | < 0.1% |
| 134 | 1 | < 0.1% |
| 133 | 1 | < 0.1% |
| 120 | 1 | < 0.1% |
| 110 | 7 | < 0.1% |
| 109 | 2 | < 0.1% |
| 107 | 2 | < 0.1% |
| 106 | 2 | < 0.1% |
prev_departure_delay_m
Real number (ℝ)
High correlation  Zeros 
| Distinct | 105 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.76171926 |
| Minimum | 0 |
| Maximum | 159 |
| Zeros | 1880371 |
| Zeros (%) | 77.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 18.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 4 |
| Maximum | 159 |
| Range | 159 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 2.682931 |
|---|---|
| Coefficient of variation (CV) | 3.5222045 |
| Kurtosis | 156.02847 |
| Mean | 0.76171926 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 9.3293082 |
| Sum | 1851169 |
| Variance | 7.1981187 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 1880371 | 77.4% |
| 1 | 233223 | 9.6% |
| 2 | 115673 | 4.8% |
| 3 | 63053 | 2.6% |
| 4 | 35162 | 1.4% |
| 5 | 23343 | 1.0% |
| 6 | 15831 | 0.7% |
| 7 | 11250 | 0.5% |
| 8 | 8895 | 0.4% |
| 9 | 7074 | 0.3% |
| Other values (95) | 36376 | 1.5% |
| Value | Count | Frequency (%) |
| 0 | 1880371 | 77.4% |
| 1 | 233223 | 9.6% |
| 2 | 115673 | 4.8% |
| 3 | 63053 | 2.6% |
| 4 | 35162 | 1.4% |
| 5 | 23343 | 1.0% |
| 6 | 15831 | 0.7% |
| 7 | 11250 | 0.5% |
| 8 | 8895 | 0.4% |
| 9 | 7074 | 0.3% |
| Value | Count | Frequency (%) |
| 159 | 1 | < 0.1% |
| 137 | 1 | < 0.1% |
| 135 | 1 | < 0.1% |
| 134 | 2 | < 0.1% |
| 132 | 1 | < 0.1% |
| 120 | 1 | < 0.1% |
| 110 | 7 | < 0.1% |
| 109 | 1 | < 0.1% |
| 108 | 1 | < 0.1% |
| 107 | 1 | < 0.1% |
weighted_avg_prev_delay
Real number (ℝ)
High correlation  Zeros 
| Distinct | 44253 |
|---|---|
| Distinct (%) | 1.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.55055954 |
| Minimum | 0 |
| Maximum | 114.66667 |
| Zeros | 1467334 |
| Zeros (%) | 60.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 18.5 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.35 |
| 95-th percentile | 2.6609848 |
| Maximum | 114.66667 |
| Range | 114.66667 |
| Interquartile range (IQR) | 0.35 |
Descriptive statistics
| Standard deviation | 1.8047582 |
|---|---|
| Coefficient of variation (CV) | 3.2780436 |
| Kurtosis | 180.35457 |
| Mean | 0.55055954 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 9.9561236 |
| Sum | 1337997.9 |
| Variance | 3.2571522 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 1467334 | 60.4% |
| 0.3333333333 | 15538 | 0.6% |
| 0.6666666667 | 12848 | 0.5% |
| 0.2 | 10604 | 0.4% |
| 0.5 | 9779 | 0.4% |
| 0.4 | 9360 | 0.4% |
| 0.2857142857 | 7460 | 0.3% |
| 0.25 | 6667 | 0.3% |
| 1 | 6655 | 0.3% |
| 0.1428571429 | 6276 | 0.3% |
| Other values (44243) | 877730 | 36.1% |
| Value | Count | Frequency (%) |
| 0 | 1467334 | 60.4% |
| 0.002844950213 | 4 | < 0.1% |
| 0.002898550725 | 1 | < 0.1% |
| 0.003003003003 | 16 | < 0.1% |
| 0.00303030303 | 1 | < 0.1% |
| 0.003171247357 | 1 | < 0.1% |
| 0.003174603175 | 17 | < 0.1% |
| 0.003322259136 | 1 | < 0.1% |
| 0.003361344538 | 17 | < 0.1% |
| 0.003484320557 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 114.6666667 | 1 | < 0.1% |
| 110.0714286 | 1 | < 0.1% |
| 93.76190476 | 1 | < 0.1% |
| 93.33333333 | 1 | < 0.1% |
| 84.61538462 | 1 | < 0.1% |
| 80 | 1 | < 0.1% |
| 78.06666667 | 1 | < 0.1% |
| 77.19047619 | 1 | < 0.1% |
| 74 | 1 | < 0.1% |
| 72.52747253 | 1 | < 0.1% |
max_station_number
Real number (ℝ)
High correlation 
| Distinct | 52 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 19.355618 |
| Minimum | 1 |
| Maximum | 59 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 18.5 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 6 |
| Q1 | 12 |
| median | 19 |
| Q3 | 26 |
| 95-th percentile | 33 |
| Maximum | 59 |
| Range | 58 |
| Interquartile range (IQR) | 14 |
Descriptive statistics
| Standard deviation | 8.944886 |
|---|---|
| Coefficient of variation (CV) | 0.46213385 |
| Kurtosis | -0.27620609 |
| Mean | 19.355618 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | 0.25123229 |
| Sum | 47039009 |
| Variance | 80.010986 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 25 | 151807 | 6.2% |
| 28 | 133181 | 5.5% |
| 11 | 115516 | 4.8% |
| 19 | 105387 | 4.3% |
| 15 | 102738 | 4.2% |
| 27 | 100765 | 4.1% |
| 26 | 96620 | 4.0% |
| 12 | 86910 | 3.6% |
| 13 | 84562 | 3.5% |
| 10 | 84223 | 3.5% |
| Other values (42) | 1368542 | 56.3% |
| Value | Count | Frequency (%) |
| 1 | 1404 | 0.1% |
| 2 | 10742 | 0.4% |
| 3 | 18536 | 0.8% |
| 4 | 35893 | 1.5% |
| 5 | 50933 | 2.1% |
| 6 | 60169 | 2.5% |
| 7 | 57799 | 2.4% |
| 8 | 63896 | 2.6% |
| 9 | 75768 | 3.1% |
| 10 | 84223 | 3.5% |
| Value | Count | Frequency (%) |
| 59 | 1916 | 0.1% |
| 54 | 451 | < 0.1% |
| 53 | 845 | < 0.1% |
| 51 | 562 | < 0.1% |
| 50 | 2807 | 0.1% |
| 49 | 41 | < 0.1% |
| 46 | 304 | < 0.1% |
| 45 | 538 | < 0.1% |
| 44 | 3994 | 0.2% |
| 43 | 8302 | 0.3% |
station_progress
Real number (ℝ)
High correlation 
| Distinct | 847 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.56476904 |
| Minimum | 0.016949153 |
| Maximum | 1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 18.5 MiB |
Quantile statistics
| Minimum | 0.016949153 |
|---|---|
| 5-th percentile | 0.125 |
| Q1 | 0.33333333 |
| median | 0.57142857 |
| Q3 | 0.8 |
| 95-th percentile | 1 |
| Maximum | 1 |
| Range | 0.98305085 |
| Interquartile range (IQR) | 0.46666667 |
Descriptive statistics
| Standard deviation | 0.27742442 |
|---|---|
| Coefficient of variation (CV) | 0.49121748 |
| Kurtosis | -1.1682545 |
| Mean | 0.56476904 |
| Median Absolute Deviation (MAD) | 0.23809524 |
| Skewness | -0.048710415 |
| Sum | 1372530.5 |
| Variance | 0.076964311 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 196985 | 8.1% |
| 0.5 | 89285 | 3.7% |
| 0.6666666667 | 63941 | 2.6% |
| 0.3333333333 | 57055 | 2.3% |
| 0.75 | 46247 | 1.9% |
| 0.8 | 41700 | 1.7% |
| 0.6 | 41351 | 1.7% |
| 0.4 | 41266 | 1.7% |
| 0.25 | 37494 | 1.5% |
| 0.2 | 31234 | 1.3% |
| Other values (837) | 1783693 | 73.4% |
| Value | Count | Frequency (%) |
| 0.01694915254 | 31 | < 0.1% |
| 0.01886792453 | 2 | < 0.1% |
| 0.02 | 1 | < 0.1% |
| 0.02173913043 | 1 | < 0.1% |
| 0.02222222222 | 10 | < 0.1% |
| 0.02272727273 | 51 | < 0.1% |
| 0.02325581395 | 14 | < 0.1% |
| 0.02380952381 | 92 | < 0.1% |
| 0.0243902439 | 23 | < 0.1% |
| 0.025 | 3 | < 0.1% |
| Value | Count | Frequency (%) |
| 1 | 196985 | 8.1% |
| 0.9830508475 | 33 | < 0.1% |
| 0.9814814815 | 9 | < 0.1% |
| 0.9811320755 | 17 | < 0.1% |
| 0.9803921569 | 12 | < 0.1% |
| 0.98 | 65 | < 0.1% |
| 0.9795918367 | 1 | < 0.1% |
| 0.9782608696 | 6 | < 0.1% |
| 0.9777777778 | 11 | < 0.1% |
| 0.9772727273 | 90 | < 0.1% |
Interactions
Correlations
| | IBNR | ID_Base | ID_Timestamp | arrival_delay_m | lat | long | max_station_number | prev_arrival_delay_m | prev_departure_delay_m | station_progress | stop_number | transformed_info_message | weighted_avg_prev_delay |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IBNR | 1.000 | -0.003 | 0.001 | -0.158 | 0.248 | 0.465 | 0.178 | -0.119 | -0.121 | -0.034 | 0.101 | 0.154 | -0.107 |
| ID_Base | -0.003 | 1.000 | -0.001 | -0.001 | 0.002 | 0.003 | -0.004 | 0.000 | 0.001 | 0.000 | -0.002 | 0.014 | 0.000 |
| ID_Timestamp | 0.001 | -0.001 | 1.000 | -0.026 | 0.004 | 0.003 | 0.006 | -0.014 | -0.014 | -0.000 | 0.003 | 0.041 | -0.015 |
| arrival_delay_m | -0.158 | -0.001 | -0.026 | 1.000 | -0.296 | -0.133 | 0.129 | 0.653 | 0.670 | 0.151 | 0.229 | 0.009 | 0.630 |
| lat | 0.248 | 0.002 | 0.004 | -0.296 | 1.000 | 0.214 | -0.003 | -0.191 | -0.202 | -0.012 | -0.013 | 0.212 | -0.194 |
| long | 0.465 | 0.003 | 0.003 | -0.133 | 0.214 | 1.000 | 0.093 | -0.083 | -0.086 | -0.016 | 0.049 | 0.207 | -0.074 |
| max_station_number | 0.178 | -0.004 | 0.006 | 0.129 | -0.003 | 0.093 | 1.000 | 0.176 | 0.155 | -0.138 | 0.572 | 0.125 | 0.280 |
| prev_arrival_delay_m | -0.119 | 0.000 | -0.014 | 0.653 | -0.191 | -0.083 | 0.176 | 1.000 | 0.826 | 0.160 | 0.269 | 0.013 | 0.731 |
| prev_departure_delay_m | -0.121 | 0.001 | -0.014 | 0.670 | -0.202 | -0.086 | 0.155 | 0.826 | 1.000 | 0.141 | 0.236 | 0.012 | 0.650 |
| station_progress | -0.034 | 0.000 | -0.000 | 0.151 | -0.012 | -0.016 | -0.138 | 0.160 | 0.141 | 1.000 | 0.661 | 0.018 | 0.315 |
| stop_number | 0.101 | -0.002 | 0.003 | 0.229 | -0.013 | 0.049 | 0.572 | 0.269 | 0.236 | 0.661 | 1.000 | 0.065 | 0.477 |
| transformed_info_message | 0.154 | 0.014 | 0.041 | 0.009 | 0.212 | 0.207 | 0.125 | 0.013 | 0.012 | 0.018 | 0.065 | 1.000 | 0.013 |
| weighted_avg_prev_delay | -0.107 | 0.000 | -0.015 | 0.630 | -0.194 | -0.074 | 0.280 | 0.731 | 0.650 | 0.315 | 0.477 | 0.013 | 1.000 |
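The High correlation alerts can be traced back to this matrix. The sketch below copies the coefficients for the seven variables named in those alerts and lists, for each one, the partners whose absolute coefficient reaches 0.5. The 0.5 cut-off is an assumption that happens to reproduce the alerts as listed; the report does not state the threshold it used.

```python
from collections import defaultdict

THRESHOLD = 0.5  # assumed cut-off, not stated in the report

# Upper-triangle coefficients copied from the matrix (seven alert variables).
pairs = {
    ("arrival_delay_m", "max_station_number"): 0.129,
    ("arrival_delay_m", "prev_arrival_delay_m"): 0.653,
    ("arrival_delay_m", "prev_departure_delay_m"): 0.670,
    ("arrival_delay_m", "station_progress"): 0.151,
    ("arrival_delay_m", "stop_number"): 0.229,
    ("arrival_delay_m", "weighted_avg_prev_delay"): 0.630,
    ("max_station_number", "prev_arrival_delay_m"): 0.176,
    ("max_station_number", "prev_departure_delay_m"): 0.155,
    ("max_station_number", "station_progress"): -0.138,
    ("max_station_number", "stop_number"): 0.572,
    ("max_station_number", "weighted_avg_prev_delay"): 0.280,
    ("prev_arrival_delay_m", "prev_departure_delay_m"): 0.826,
    ("prev_arrival_delay_m", "station_progress"): 0.160,
    ("prev_arrival_delay_m", "stop_number"): 0.269,
    ("prev_arrival_delay_m", "weighted_avg_prev_delay"): 0.731,
    ("prev_departure_delay_m", "station_progress"): 0.141,
    ("prev_departure_delay_m", "stop_number"): 0.236,
    ("prev_departure_delay_m", "weighted_avg_prev_delay"): 0.650,
    ("station_progress", "stop_number"): 0.661,
    ("station_progress", "weighted_avg_prev_delay"): 0.315,
    ("stop_number", "weighted_avg_prev_delay"): 0.477,
}

# Collect, per variable, every partner at or above the threshold.
partners = defaultdict(set)
for (left, right), coefficient in pairs.items():
    if abs(coefficient) >= THRESHOLD:
        partners[left].add(right)
        partners[right].add(left)

for name in sorted(partners):
    print(f"{name}: {sorted(partners[name])}")
```

With this threshold, each delay feature has three highly correlated partners and stop_number has two, matching the "and N other fields" wording of the alerts. For modelling, it suggests the three lagged delay features carry overlapping information.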
Missing values
Sample
First rows
| | ID_Base | ID_Timestamp | stop_number | IBNR | long | lat | arrival_plan | departure_plan | arrival_delay_m | transformed_info_message | prev_arrival_delay_m | prev_departure_delay_m | weighted_avg_prev_delay | max_station_number | station_progress |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -1001326572688500578 | 2407082041 | 2 | 8011118.0 | 13.375988 | 52.509379 | 2024-07-08 20:44:00 | 2024-07-08 20:45:00 | 0.0 | No message | 0.0 | 0.0 | 0.000000 | 7 | 0.285714 |
| 1 | -1001326572688500578 | 2407082041 | 3 | 8011160.0 | 9.095851 | 48.849792 | NaN | NaN | NaN | No message | 0.0 | 0.0 | 0.000000 | 7 | 0.428571 |
| 2 | -1001326572688500578 | 2407082041 | 4 | 8011167.0 | 13.299437 | 52.530276 | 2024-07-08 20:55:00 | 2024-07-08 20:56:00 | 0.0 | No message | 0.0 | 0.0 | 0.000000 | 7 | 0.571429 |
| 3 | -1001326572688500578 | 2407082041 | 5 | 8010404.0 | 13.196898 | 52.534648 | 2024-07-08 21:00:00 | 2024-07-08 21:03:00 | 2.0 | No message | 0.0 | 0.0 | 0.000000 | 7 | 0.714286 |
| 4 | -1001326572688500578 | 2407082041 | 6 | 8080040.0 | 13.128917 | 52.549396 | 2024-07-08 21:06:00 | 2024-07-08 21:07:00 | 1.0 | No message | 2.0 | 0.0 | 0.666667 | 7 | 0.857143 |
| 5 | -1001326572688500578 | 2407082041 | 7 | 8081586.0 | 13.116810 | 52.552480 | 2024-07-08 21:08:00 | 2024-07-08 21:09:00 | 6.0 | No message | 1.0 | 1.0 | 0.761905 | 7 | 1.000000 |
| 6 | -1001326572688500578 | 2407092041 | 2 | 8011118.0 | 13.375988 | 52.509379 | 2024-07-09 20:44:00 | 2024-07-09 20:45:00 | 0.0 | No message | 0.0 | 0.0 | 0.000000 | 7 | 0.285714 |
| 7 | -1001326572688500578 | 2407092041 | 3 | 8011160.0 | 8.309970 | 54.920783 | NaN | NaN | NaN | No message | 0.0 | 0.0 | 0.000000 | 7 | 0.428571 |
| 8 | -1001326572688500578 | 2407092041 | 4 | 8011167.0 | 13.299437 | 52.530276 | 2024-07-09 20:55:00 | 2024-07-09 20:56:00 | 0.0 | No message | 0.0 | 0.0 | 0.000000 | 7 | 0.571429 |
| 9 | -1001326572688500578 | 2407092041 | 5 | 8010404.0 | 13.196898 | 52.534648 | 2024-07-09 21:00:00 | 2024-07-09 21:03:00 | 4.0 | No message | 0.0 | 0.0 | 0.000000 | 7 | 0.714286 |
Last rows
| | ID_Base | ID_Timestamp | stop_number | IBNR | long | lat | arrival_plan | departure_plan | arrival_delay_m | transformed_info_message | prev_arrival_delay_m | prev_departure_delay_m | weighted_avg_prev_delay | max_station_number | station_progress |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2430241 | 999976718847540977 | 2407090447 | 6 | 8005649.0 | 7.110814 | 49.274763 | 2024-07-09 05:01:00 | 2024-07-09 05:02:00 | 1.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 1.000000 |
| 2430242 | 999976718847540977 | 2407100447 | 2 | 8005241.0 | 7.018788 | 49.230425 | 2024-07-10 04:50:00 | 2024-07-10 04:51:00 | 0.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 0.333333 |
| 2430243 | 999976718847540977 | 2407100447 | 3 | 8005306.0 | 7.199622 | 51.177270 | NaN | NaN | NaN | No message | 0.0 | 0.0 | 0.0 | 6 | 0.500000 |
| 2430244 | 999976718847540977 | 2407100447 | 4 | 8005332.0 | 7.057083 | 49.244018 | 2024-07-10 04:55:00 | 2024-07-10 04:56:00 | 0.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 0.666667 |
| 2430245 | 999976718847540977 | 2407100447 | 5 | 8005044.0 | 7.004241 | 51.160909 | NaN | NaN | NaN | No message | 0.0 | 0.0 | 0.0 | 6 | 0.833333 |
| 2430246 | 999976718847540977 | 2407100447 | 6 | 8005649.0 | 7.110814 | 49.274763 | 2024-07-10 05:01:00 | 2024-07-10 05:02:00 | 1.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 1.000000 |
| 2430247 | 999976718847540977 | 2407120447 | 2 | 8005241.0 | 7.018788 | 49.230425 | 2024-07-12 04:50:00 | 2024-07-12 04:51:00 | 0.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 0.333333 |
| 2430248 | 999976718847540977 | 2407120447 | 3 | 8005306.0 | 8.243728 | 50.070788 | NaN | NaN | NaN | No message | 0.0 | 0.0 | 0.0 | 6 | 0.500000 |
| 2430249 | 999976718847540977 | 2407120447 | 4 | 8005332.0 | 7.057083 | 49.244018 | 2024-07-12 04:55:00 | 2024-07-12 04:56:00 | 0.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 0.666667 |
| 2430250 | 999976718847540977 | 2407120447 | 6 | 8005649.0 | 7.110814 | 49.274763 | 2024-07-12 05:01:00 | 2024-07-12 05:02:00 | 5.0 | No message | 0.0 | 0.0 | 0.0 | 6 | 1.000000 |
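In the sample rows, station_progress matches stop_number divided by max_station_number, which would also explain the high correlation between those fields. The report does not document how the column was built, so the relationship below is an inference from the sample, and `station_progress` is a hypothetical helper.

```python
import math


def station_progress(stop_number: int, max_station_number: int) -> float:
    """Fraction of the route completed at this stop (inferred definition)."""
    return stop_number / max_station_number


# (stop_number, max_station_number, reported station_progress) from the sample.
sample = [
    (2, 7, 0.285714),
    (3, 7, 0.428571),
    (6, 7, 0.857143),
    (7, 7, 1.000000),
    (2, 6, 0.333333),
    (5, 6, 0.833333),
]

# The sample prints six decimals, so compare with a matching tolerance.
matches = [
    math.isclose(station_progress(stop, max_stop), reported, abs_tol=5e-7)
    for stop, max_stop, reported in sample
]
print("all sample rows match:", all(matches))
```

The reported minimum of 0.016949153 is also consistent with this reading, since it equals 1/59 and 59 is the largest max_station_number.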